A Fast Inference Vision Transformer for Automatic Pavement Image Classification and Its Visual Interpretation Method

Abstract

Traditional automatic pavement distress detection methods using convolutional neural networks (CNNs) require a great deal of computing time and resources and are poor in terms of interpretability. Therefore, inspired by the successful application of the Transformer architecture in natural language processing (NLP) tasks, a novel Transformer method called LeViT was introduced for automatic asphalt pavement image classification. LeViT consists of convolutional layers, transformer stages where Multi-layer Perceptron (MLP) and multi-head self-attention blocks alternate using a residual connection, and two classifier heads. To evaluate the proposed method, pavement image datasets from three different sources and pre-trained weights based on ImageNet were obtained. The performance of the proposed model was compared with six state-of-the-art (SOTA) deep learning models. All of them were trained with a transfer learning strategy. Compared to the tested SOTA models, LeViT has less than 1/8 of the parameters of the original Vision Transformer (ViT) and 1/2 of those of ResNet and InceptionNet. Experimental results show that after training for 100 epochs with a batch size of 16, the proposed method acquired 91.56% accuracy, 91.72% precision, recall [...], and a 91.45% F1-score on the Chinese asphalt pavement dataset, and 99.17% accuracy and 99.19% precision on the German asphalt pavement dataset, which is the best performance among all the tested SOTA models. Moreover, it shows superiority in inference speed (86 ms/step), which is approximately 25% of that of the original ViT and 80% of that of some prevailing CNN-based models, including DenseNet, VGG, and ResNet. Overall, the proposed method can achieve competitive performance with fewer computation costs. In addition, a visualization method combining Grad-CAM and Attention Rollout was proposed to analyze the classification results and explore what has been learned in every MLP and attention block of LeViT, which improved the interpretability of the proposed pavement image classification model.
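The abstract names Attention Rollout as one half of the visualization method but does not describe it. The following is a minimal sketch of generic Attention Rollout, assuming head-averaged attention matrices and an equal 0.5/0.5 mix with the identity to account for residual connections. The function names, the toy 2-token input, and the mixing weight are illustrative assumptions, not details from the paper. How the authors combine the result with Grad-CAM is not covered here.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def attention_rollout(layers):
    """Combine per-layer attention maps into one token-to-token map.

    `layers` is a list of head-averaged attention matrices, ordered from the
    first block to the last; each is an n x n list of rows that sum to 1.
    """
    n = len(layers[0])
    # Start from the identity: before any block, each token maps to itself.
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layers:
        # Mix in the identity to model the residual (skip) connection.
        mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        # Re-normalise each row so it still sums to 1.
        mixed = [[v / sum(row) for v in row] for row in mixed]
        # Later layers multiply on the left of the running product.
        rollout = matmul(mixed, rollout)
    return rollout


# Toy input (hypothetical): two tokens, two attention blocks.
layers = [
    [[0.5, 0.5], [0.0, 1.0]],
    [[1.0, 0.0], [0.5, 0.5]],
]
rollout = attention_rollout(layers)
```

Each row of `rollout` estimates how much every input token contributes to one output token after all blocks. In an image model, a row (or an average over rows) would be reshaped to the patch grid and upsampled to give a heat map.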

Similar resources

A Lightweight Inference Method for Image Classification

We demonstrate a two-phase classification method, first of individual pixels, then of fixed regions of pixels, for scene classification: the task of assigning posteriors that characterize an entire image. This can be realized with a probabilistic graphical model (PGM), without the segmentation and aggregation tasks characteristic of visual object recognition. Instead the spatial as...

Analogical Inference in Automatic Interpretation

We present findings suggesting that analogical inference can play a role in the fundamental processes involved in automatic comprehension and interpretation. Participants were found to use information from a prior relationally similar example in understanding the content of a currently encoded example. Further, in doing so they were sensitive to structural mappings between the two instances, ru...

AN-EUL method for automatic interpretation of potential field data in unexploded ordnances (UXO) detection

We have applied AN-EUL, an automatic interpretation method for potential field data that combines the analytic signal and Euler deconvolution approaches, to unexploded ordnance (UXO) prospecting. The method can be applied to both magnetic and gravity data, as well as to gradient surveys, based upon the concept of the structural index (SI) of a potential anomaly which is relate...

Visual Image Interpretation

Often the first step in a remote sensing change detection study investigating coastal dynamics is to delineate the actual coastline from the available images. This can often be a difficult task, partly because of the uncertainty over what is and is not the coastline. Whilst it may appear to be common sense to use the water line (where sea meets land) as the best indicator, it may not be that easy...

A Fast, Robust, Automatic Blink Detector

Introduction: “Blink” is defined as closing and opening of the eyes in a small duration of time. In this study, we aimed to introduce a fast, robust, vision-based approach for blink detection. Materials and Methods: This approach consists of two steps. In the first step, the subject’s face is localized every second and with the first blink, the system detects the eye’s location and creates an ope...

Journal

Journal title: Remote Sensing

Year: 2022

ISSN: 2315-4632, 2315-4675

DOI: https://doi.org/10.3390/rs14081877